Module - Geospatial Analysis

Task 2.1 - Apply geospatial visualisation tool (e.g. GeoPandas) on the dataset provided

Reading the dataset from the Excel file

Data Preprocessing

Use the GeoPandas or similar visualisation tool to plot a set of choropleth maps representing the world GDP per capita for the years 1995, 2005, and 2015 respectively

Reading the dataset bundled with GeoPandas to get the country geometry (latitude and longitude) needed for plotting

Plotting a choropleth map of GDP per capita for 1995 using GeoPandas, with a legend

Interpretation

Plotting a choropleth map of GDP per capita for 2005 using GeoPandas, with a legend

Interpretation

Plotting a choropleth map of GDP per capita for 2015 using GeoPandas, with a legend

Interpretation

Task 2.2

2.2 Analyse the datasets and answer specific questions. For plotting within this section, you can use any visualisation tool.

Note

Preprocessing for task 2.2.1 to 2.2.4

Task 2.2.1

Note: I do the preprocessing first and plot all the graphs at the bottom of the file for easier viewing.

For year 2015, plot the GDP per capita for only the countries having population greater than 300000000. Very briefly interpret the generated plot.

Task 2.2.2

For year 2015, plot the GDP per capita for only the countries having population less than 70000000. Very briefly interpret the generated plot.

Task 2.2.3

For year 2015, plot the GDP per capita for only the countries having gross GDP between 450000000000 US dollars and 8920000000000 US dollars.
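A minimal sketch of this filter, assuming gross GDP is not stored directly and is derived as GDP per capita multiplied by population, and that "between" includes both bounds. The records, field names and figures are made up for illustration and are not taken from the dataset provided.

```python
# Hypothetical 2015 records; names and values are illustrative only.
records_2015 = [
    {"country": "A", "gdp_per_capita": 50000.0, "population": 100_000_000},
    {"country": "B", "gdp_per_capita": 2000.0, "population": 50_000_000},
    {"country": "C", "gdp_per_capita": 8000.0, "population": 1_200_000_000},
]

LOWER = 450_000_000_000      # 450 billion US dollars
UPPER = 8_920_000_000_000    # 8.92 trillion US dollars


def filter_by_gross_gdp(records, lower, upper):
    """Keep records whose derived gross GDP lies within [lower, upper]."""
    selected = []
    for row in records:
        # Assumption: gross GDP = GDP per capita * population.
        gross_gdp = row["gdp_per_capita"] * row["population"]
        if lower <= gross_gdp <= upper:
            selected.append({**row, "gross_gdp": gross_gdp})
    return selected


selected = filter_by_gross_gdp(records_2015, LOWER, UPPER)
```

With these made-up figures, only the countries whose derived gross GDP falls inside the range remain in `selected`, and their `gdp_per_capita` values are what would be plotted.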

Task 2.2.4

What is the percentage change in the GDP per capita from 1995 to 2015, for the country having the highest population in 2015?
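The calculation can be sketched as follows, assuming the percentage change is measured relative to the 1995 value. The dictionaries and their figures are hypothetical placeholders, not values from the dataset.

```python
def percentage_change(old, new):
    """Percentage change from old to new, relative to old."""
    return (new - old) / old * 100.0


# Hypothetical inputs; country names and numbers are illustrative only.
population_2015 = {"A": 1_000, "B": 5_000, "C": 3_000}
gdp_per_capita = {
    "A": {1995: 100.0, 2015: 150.0},
    "B": {1995: 200.0, 2015: 500.0},
    "C": {1995: 400.0, 2015: 300.0},
}

# Step 1: find the country with the highest population in 2015.
most_populous = max(population_2015, key=population_2015.get)

# Step 2: compare its GDP per capita in 1995 and 2015.
change = percentage_change(
    gdp_per_capita[most_populous][1995],
    gdp_per_capita[most_populous][2015],
)
```

A positive result means GDP per capita grew over the period, and a negative one means it fell.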

Task 2.2.6.1

Present a correlation plot between mean population of each country and mean per capita GDP (from 1995 to 2015). Very briefly interpret the generated plot
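Alongside the plot, the strength of the relationship can be summarised with a Pearson correlation coefficient. The sketch below assumes each country has one population value and one GDP per capita value per year. All names and numbers are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical yearly values per country (illustrative only).
population = {"A": [8.0, 10.0, 12.0], "B": [18.0, 20.0, 22.0], "C": [28.0, 30.0, 32.0]}
gdp_per_capita = {"A": [25.0, 30.0, 35.0], "B": [15.0, 20.0, 25.0], "C": [5.0, 10.0, 15.0]}

countries = sorted(population)
mean_population = [mean(population[c]) for c in countries]
mean_gdp_per_capita = [mean(gdp_per_capita[c]) for c in countries]

r = pearson(mean_population, mean_gdp_per_capita)
```

A value of `r` close to -1 indicates that larger mean populations go with lower mean GDP per capita, a value close to +1 indicates the opposite, and a value near 0 indicates no linear relationship.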

Task 2.2.5

Mean of per capita GDP (from 1995 to 2015) of all the countries.

Task 2.2.6.2

Graphs

Task 2.2.1

Interpretation

1. Only the three countries with a 2015 population above 300000000 (300 million) are shown: USA, India and China.

2. India and China have a much larger population than the USA, but their GDP per capita is far lower.

3. The USA has a yellow shade, while India and China have a dark purple shade.

4. The USA has the highest GDP per capita (52116.738813), China is second (6500.281937) and India is last (1751.664429).

5. This plot compares the population of countries with a population above 300 million.

6. The population is concentrated in two countries, India and China.

Task 2.2.2

Interpretation

  1. This shows countries with a population of less than 70000000.

  2. Luxembourg, Switzerland, Ireland, Netherlands, Sweden and Qatar have a high GDP per capita.

  3. This plot shows that small populations are scattered across large regions of the world, whereas India and China have huge populations.

Task 2.2.3

Interpretation

India, Nigeria, Iran (Islamic Rep.), China and Mexico are among the countries with the lowest GDP per capita in this group.

Task 2.2.5

Interpretation for task 2.2.5

1. This plot shows the mean GDP per capita from 1995 to 2015.

2. Most countries maintained a high GDP per capita from 1995 onwards.

3. The UAE has a dark shade of blue because its economy grew very quickly over the following decade.

Task 2.2.6

Interpretation

    This map shows the countries with their mean GDP per capita from 1995 to 2015.
    Most of the countries have maintained their GDP per capita.
    While living standards have improved, the population has increased a lot.

This scatter plot shows that countries with larger populations tend to have a lower GDP per capita.

Each country is represented by a dot of a different color.

The only exception is the USA, which has both a high population and a high GDP per capita.

Most countries with a high GDP per capita have a low population.

This plot has four panels showing the correlation of mean GDP per capita with population, and of each variable with itself.

This scatter plot shows the different countries on a horizontal plane. The size of each marker is determined by the country's GDP per capita.

Interpretation

2.3 Social analytics

Goal: In this task, I am applying sentiment analysis to Twitter data using the Python libraries TextBlob and Tweepy

  1. Collect 500 tweets on the topic #Lockdown or #CovidLockdown with a Python script.
  2. Clean the tweets, for example by removing URLs.
  3. Calculate the polarity value of each tweet.
  4. Visualise the tweets as a word cloud.
  5. Analyse public sentiment based on the polarity values.
  6. Final analysis (interpretation of the results and recommendation).
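The cleaning step can be sketched with a regular expression from the standard library. The pattern and the function name `clean_tweet` are assumptions made for illustration. The pattern only covers links starting with `http://`, `https://` or `www.`, which may not catch every URL form found in real tweets.

```python
import re

# Assumed pattern: matches http(s) links and bare www. links up to the next space.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")


def clean_tweet(text):
    """Remove URLs from a tweet and collapse the leftover whitespace."""
    without_urls = URL_PATTERN.sub("", text)
    return " ".join(without_urls.split())


# Hypothetical tweet text, not taken from the collected data.
example = "Stay home https://t.co/abc123 #Lockdown"
cleaned = clean_tweet(example)
```

Collapsing whitespace after the substitution avoids the double spaces that are otherwise left where a URL used to be.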

Setup

Import external libraries (thus verifying they are correctly installed)

Collect tweets using hashtag (#Lockdown)

Recent tweets matching a given query can be searched using the Twitter Search API, for which many Python libraries exist. Here we see how to obtain tweets using the TwitterSearch package, installable with pip install TwitterSearch. A Twitter account with an associated mobile number is needed in order to use the API.

Creating a Twitter application

In order to use the Twitter APIs you need API keys. Follow these steps to obtain them:

  1. Go to https://developer.twitter.com/en/apps and log in with your Twitter account
  2. Click the "Create New App" button, fill in the form with short descriptive values (you may use e.g. "http://example.com" as the URL) and confirm
  3. Click on the app you just created and open the "Key and Access Tokens" tab
  4. For better security, make sure to set the Access Level to Read-only
  5. Click on the "Create my access token" button below

You will need the strings labeled Consumer Key, Consumer Secret, Access Token and Access Token Secret shown on the page to use the API.

Authenticating

Import the necessary classes and create a TwitterSearch object, providing the API keys obtained above

We have some empty rows here, so let's remove them before further processing.

The command below removes all rows where the Tweet column equals "".
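One way such a filter might look in pandas is shown below. The data frame name `tweets_df` and its contents are hypothetical. Only the `Tweet` column name comes from the text above.

```python
import pandas as pd

# Hypothetical data frame standing in for the cleaned tweets.
tweets_df = pd.DataFrame(
    {"Tweet": ["Stay safe everyone", "", "Working from home today", ""]}
)

# Keep only the rows whose Tweet column is not the empty string,
# then renumber the index so it runs from 0 again.
tweets_df = tweets_df[tweets_df["Tweet"] != ""].reset_index(drop=True)
```

Resetting the index is optional, but it keeps later positional lookups consistent after rows have been dropped.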

We can see that we have a calculated score for the subjectivity and polarity in our data frame.

Now let's build a function and categorize our tweets as Negative, Neutral and Positive.

Then apply this function to create another feature in our data frame called Score.
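A minimal sketch of such a categorisation, assuming polarity values in the range -1 to 1 with zero treated as neutral. The function name `polarity_to_label` and the sample values are illustrative, and the exact thresholds are an assumption.

```python
def polarity_to_label(polarity):
    """Map a polarity value to Negative, Neutral or Positive."""
    if polarity < 0:
        return "Negative"
    if polarity == 0:
        return "Neutral"
    return "Positive"


# Hypothetical polarity values, one per tweet.
polarities = [-0.5, 0.0, 0.8]

# One label per tweet; in a data frame these would form the Score column.
scores = [polarity_to_label(p) for p in polarities]
```

With a pandas data frame, the same function could be passed to the `apply` method of the polarity column to build the Score column in one step.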

Social media analytics Interpretation

1) To solve this part I chose the hashtag #Lockdown.

2) To collect tweets from Twitter we need the Twitter access token keys and secret keys.

3) Using the above keys and the Tweepy API, I was able to collect the tweets.

4) After collecting the tweets, I cleaned them (removed URLs) using Python functions as suggested, and then used the TextBlob library to calculate the polarity and subjectivity values.

5) I plotted a word cloud and listed the positive and negative tweets.

  1. People are used to lockdown and are willing to work from home for now.
  2. Because of the newly developed strain, the government announced a strict lockdown, which may be why people posted positive tweets about this preventive measure using the lockdown hashtag.

As per my analysis, people are feeling positive. On the other hand, this depends entirely on lifestyle and on how each individual responds. Considering the transmission rate, however, I feel the COVID lockdown is justified on its own merits.